Relative dynamic skew compensation of parallel data lines

ABSTRACT

A system performs a two-step skew compensation procedure by first correcting for any phase error alignment between a parallel link clock and data signal edges of each data channel, thereby allowing the received data bits to be correctly sampled. Then, a second step is performed to “word-align” the bits into the original format, which is accomplished with a Skew Synchronizing Marker (SSM) byte in a data FIFO of each data channel. The SSM byte is transmitted on each data channel and terminates the skew compensation procedure. When the SSM byte is detected by logic in the data FIFO of each data channel, the data FIFO employs the SSM byte to initialize the read and write pointers to properly align the output data.

FIELD OF THE INVENTION

The present invention relates to high-speed data links and, more particularly, to high-speed parallel data links. Specifically, one embodiment of the present invention provides a system to deskew a high-speed parallel data link.

BACKGROUND OF THE INVENTION

As the speed and performance of digital systems increase, demands on interconnects that “link” these systems also increase. “Links” are communications paths between systems, sub-systems, and components enabling them to exchange data. Digital data can be transferred as pulses of electrical energy over electrically conductive material such as metal wires. An alternate technique for conveying digital data is by pulses of light over optic fiber.

Traditionally, serial line protocols employing encoded clock and data have been the protocols of choice for long-haul applications such as WAN (Wide Area Network), LAN (Local Area Network), SAN (Storage Area Network), as well as other proprietary links. This dominance includes both wire and fiber optic networks. A trend has also emerged at bandwidths of 2.5 Gb/s (current base line) and 10 Gb/s (projected to arrive within a few years) with companies that manufacture servers and routers as well as other high-speed digital systems vendors having begun to adopt serial line protocols as their high-speed backplane interconnect to implement their systems. This approach is being adopted for raising the overall throughput of digital systems.

The physical implementation of serial protocols of encoded clock and data is based on PLL (Phase Locked Loop) architectures to first recover the encoded clock and then employ the clock to sample received data. Critical design factors for a typical PLL are the LPF (Low Pass Filter) and VCO (Voltage Controlled Oscillator) that are area intensive for their implementation and guard-band or “keep-out” region.

The overall bandwidth of a serial link is determined by the characteristics of the interconnect medium and the PLL's ability to accurately recover the encoded clock. Hence, the bandwidth of a serial link cannot be increased easily from the original implementation, requiring a significant re-architecting and design effort.

Alternatively, parallel line protocols utilize like wires or optic fibers that simultaneously transfer a number of bits of digital data equal to the number of wires or optic fiber channels used, called “words.” Ideally, all bits pertaining to a particular word arrive at the intended destination simultaneously and are sampled on the occurrence of the next available clock edge. In practice, however, this is typically not the case for high-speed parallel data links. That is, due to variations in the materials used to construct either wire or optic fiber links, as well as variations in fabrication process, the propagation delay or speed of the digital signals comprising the bits will vary slightly among wires or optic fibers. This results in differences in arrival times of the bits, referred to as “signal skew” or simply “skew.” Wire skew and skew in optic fibers (skew contribution of wire or optic fiber) is proportional to the physical length of the path included in the parallel link. As the amount of skew between the lines of a parallel link increases, the skew further reduces the amount of bit “overlap” observed at the link's destination, thereby increasing the likelihood of a data sampling error. As a result, parallel links without a means of compensating for skew typically tolerate a total (both line and circuit) skew budget of less than 20% of the nominal bit time. This limits the operational distance and bandwidth of most parallel links to less than 10 meters in cable running at approximately 400 Mb/s and 0.15 meters at 1.0 Gb/s using the more common backplane fabrication materials.
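By way of illustration only, the 20% skew budget stated above can be converted into an absolute time for the two bit rates mentioned. The following is a minimal sketch; the function name `skew_budget_ps` and the use of picoseconds are assumptions made for the example and do not appear in the specification.

```python
def skew_budget_ps(bit_rate_bps, budget_fraction=0.20):
    """Upper bound on total (line plus circuit) skew, in picoseconds.

    bit_rate_bps    -- per-line bit rate in bits per second
    budget_fraction -- fraction of the nominal bit time allowed for skew
                       (0.20 reflects the "less than 20%" figure in the text)
    """
    bit_time_ps = 1e12 / bit_rate_bps      # nominal bit time
    return budget_fraction * bit_time_ps   # tolerable skew is below this value


# The two per-line rates quoted in the text.
for rate in (400e6, 1.0e9):
    print(f"{rate / 1e6:.0f} Mb/s: skew must stay below {skew_budget_ps(rate):.0f} ps")
```

Under these assumptions, a line running at 400 Mb/s has a 2500 ps bit time and must keep total skew under roughly 500 ps, while a line at 1.0 Gb/s must stay under roughly 200 ps.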

Compensation techniques for skew in parallel links are known. One approach to compensate for skew was developed for the ANSI HIPPI-6400-PHY standard by Silicon Graphics, Inc. located in Mountain View, Calif. As disclosed in U.S. Pat. No. 6,031,847, a training sequence is used to measure the amount of skew between each of the parallel channels, which include clock, data, frame, and control bits. The training sequence is comprised of four sub-sequences: 1) preamble, 2) flush sequence, 3) ping sequence, and 4) post-amble. Assuming that the leading edge of the ping sequence of all channels is aligned at the time of launch from the source, any difference in arrival times at the destination represents the amount of skew present among the channels. Based on measurements, additional delay is added to each channel to re-establish the alignment of the received bits. Once edge alignment has been re-established, the clock signal is further delayed to center its edges with respect to the center of the data bits to more accurately recover received data.

Considered in more detail, FIG. 1 shows a timing diagram of the training sequence developed for the HIPPI-6400 ANSI standard. Shown is the training sequence having four sub-sequences consisting of a preamble command, flush, ping, and post-amble. The training sequence is “balanced” with zero and one assertion times being equal. An important feature is that the length of the flush and ping sequences must match, and the length of each sequence must exceed the total skew for the link to be compensated.

Additionally, FIG. 2 shows a block diagram for a known data channel architecture having a conventional delay line, namely, a SuMAC data channel. The SuMAC DSCC's (Dynamic Skew Compensation Circuit) data channel consists of input measurement control circuitry and logic, an inverter chain, measurement latch, tap decode logic, and tap-select “OR” tree. The length of the inverter chain and all other associated structures must be equal to or larger than the skew to be compensated.

There are three key attributes of this architecture that must be considered when extending the “skew range.” A first attribute to be considered is that the training sequence used in this scheme creates flush and ping sub-sequences during which the switching activity on the parallel link goes to zero for a period of time. This period of switching inactivity introduces a short-term drift in the DC balance of copper cable and ambient light level of fiber optic links. Short-term drift of both DC balance and ambient light produces a “start-up” uncertainty or “jitter” phenomenon when switching activity resumes on the link. The “start-up jitter” phenomenon occurs at the critical flush and ping sub-sequence boundary, thereby affecting the accuracy of the skew measurement. Hence, when the training sequence is modified to accommodate a greater skew range, the flush and ping sub-sequences must be increased, which creates a larger imbalance in the DC level of a copper cable link or ambient light level for a fiber optic link. Ultimately, this can reduce the maximum bandwidth obtainable due to measurement error caused by larger amounts of “start-up jitter” being introduced into the skew measurement.

A second attribute that must be considered is the number of channels that can be practically constructed. As shown in FIG. 2, an internal signal, “All_Present,” generated by the logical “AND” of the individual ping signals received by all of the channels must be re-distributed to all of the channels of the parallel bus in order to capture the skew values. As the number of channels increases, the logical “AND” function must also increase, thereby requiring more time to complete this function. Hence, the overall size of all of the channels delays the arrival time of the “All_Present” signal, requiring the delay line to be lengthened and the overall size of the delay stack to be increased, resulting in higher power dissipation.

A third attribute concerns the overall length of the delay that governs the total amount of skew that can be handled. Hence, to increase the skew range, the length of a delay line comprising the hardware and overall size of the delay stack must be increased proportionately. As a result, the delay line is more difficult to design, and the increased size also increases the power that is dissipated.

It would therefore be desirable to effectively correct for skew in a parallel link at higher bandwidths to assure that data is accurately sampled. It would also be desirable to enable skew to be corrected using hardware that does not require a long delay line or increase power consumption. Additionally, it would be desirable to provide a skew correction architecture that is scalable to avoid having to redesign the hardware from one application to another.

SUMMARY OF THE INVENTION

The present invention provides a method and apparatus for relative dynamic skew compensation of parallel data lines. One embodiment of the present invention provides a system that performs a two-step skew compensation procedure referred to as a training sequence by first correcting for any phase error alignment between the parallel link clock and data signal edges of each data channel, thereby allowing the received data bits to be correctly sampled. Then, a second step is performed to “word-align” the bits into the original format, which is accomplished with an SSM (Skew Synchronizing Marker) byte in a data FIFO of each data channel. The SSM byte is transmitted on each data channel and terminates the training sequence. When the SSM byte is detected by logic in the data FIFO of each data channel, the data FIFO employs the SSM byte to initialize the read and write pointers to properly align the output data.

In accordance with various embodiments of the present invention, a link is comprised of source and destination nodes with an interconnect medium constructed of copper cable or optic fiber, for example. At the link source node, an SSD (Source Synchronous Driver) formats “M” bits of input data received from the core logic and drives “M” data channels onto the physical link along with a link clock. The “M” data bits and link clock are received at the link destination node by a Dynamic Skew Compensation (DSC) architectural block that compensates for skew, re-centers the link clock edge relative to the bits of data, and outputs “M” bits of data.

A novel delay line architecture referred to as a “Multiple-Input-Single-Exit” delay line, or MISX-DL, provides a variable multi-tap delay line configuration in which the delayed signal is extracted from the last tap of the delay line and the input signal is introduced at 1-of-m “injection points” along its length. The desired delay is achieved by introducing the input signal at a selected injection point relative to the last tap of the delay line, thereby eliminating the need for a conventional tap-select multiplexer circuit. As a direct result of eliminating the multiplexer circuit, a reduction in the overall size, delay latency, and switching power is achieved. Additionally, with the incorporation of a signal phase splitter circuit, the delay line architecture can easily support tap resolutions based on an inverter delay with a modest increase in the number of transistors required, thereby effectively doubling the resolution.
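By way of illustration only, the injection-point principle of the MISX-DL may be sketched as a behavioral model in which the output is always taken from the last tap and the delay is set purely by where the input enters the line. The function names, the numbering of injection points, the assumption of identical per-stage delay, and the minimum of one stage are all hypothetical choices made for this sketch; they are not taken from the specification.

```python
def select_injection_point(desired_delay_ps, tap_delay_ps, num_points):
    """Return a 1-of-m (one-hot) vector choosing where the input enters the line.

    Assumed numbering: point 0 is farthest from the exit tap, so a signal
    injected at point k crosses (num_points - k) stages before leaving.
    """
    stages = int(desired_delay_ps / tap_delay_ps + 0.5)  # nearest whole stage
    stages = max(1, min(num_points, stages))             # clamp to the line length
    k = num_points - stages
    return [1 if i == k else 0 for i in range(num_points)]


def line_delay_ps(select, tap_delay_ps):
    """Delay seen at the single exit tap for a given one-hot injection vector."""
    if sum(select) != 1:
        raise ValueError("exactly one injection point must be enabled")
    k = select.index(1)
    return (len(select) - k) * tap_delay_ps


sel = select_injection_point(desired_delay_ps=300, tap_delay_ps=100, num_points=8)
print(sel, line_delay_ps(sel, 100))
```

In this model no output multiplexer appears anywhere: the only per-tap control is the one-hot enable that steers the input into the line, which is the property the paragraph above attributes to the architecture.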

Further, a novel data FIFO architecture is provided based on an “Ordered Population Count” sequence with Gray-code characteristics along with a mathematical procedure and logic implementation, that eliminates the need for conversion and significantly reduces comparison latencies. The architecture supports a latch-based implementation to accommodate high-speed applications.

Also, a gated clock provides a latch-based technique for controlling the starting and stopping of a high-speed clock signal while utilizing re-generative feedback to prevent clock glitch. A further extension provides a means of producing a divide-by-n clock signal that remains synchronous with a similarly generated reference base clock signal. The link width is comprised of one or more parallel data channels and link clock that together form a “bundle.” At the link destination node, the receiving bundle consists of one or more DSC Modules having the same or mixed sizes and a Bundle Interface Module (BIM) for bundles containing two or more DSC Modules.

Each DSC Module preferably includes a single clock channel and one or more data channels. The clock channel receives the link clock signal, generates a data-recovery link clock, and distributes the data-recovery link clock along with the standard link clock signal to all data channels within the DSC Module. Each data channel of a DSC Module receives a link data signal, phase-corrects, and captures data using the data-recovery link clock signal. Recovered data is then word-aligned in the data FIFO for that data channel before being presented as output data to the core logic.

The clock channel of each DSC Module is preferably comprised of three sub-blocks, including a clock channel front end, BIST (Built-In Self-Test) block, and utility block. The clock channel front-end block generates the data-recovery link clock signal and distributes this signal to all data channels within the DSC Module. The BIST block provides a means for testing critical functionality of the entire DSC Module and reporting self-test status. The utility block provides logical functions to coordinate the interface between a cold training sequence, warm training sequence, and DSC and core logic and the data FIFO read pointer control across the bundle.

Each data channel of a DSC Module is preferably comprised of three sub-blocks, including a data channel front end, data FIFO, and utility block. The data channel front-end block functions to phase-correct the data and clock signal edges during the phase correction sub-sequence of the cold or warm training sequences. Once phase correction has been completed, the data-recovery link clock signal from the clock channel is employed to sample the phase-corrected data. The data FIFO block receives positive and negative phase-aligned data from the data channel front end and stores this data in a FIFO framed on the prescribed byte boundaries. The data FIFO block is also used to detect commands to initiate a warm training sequence and the SSM byte used to initialize the write pointer frame counter and start the data FIFO read pointers in the bundle for both cold training sequence and warm training sequence operations. The utility block of the data channel performs two primary functions, namely, 1) data FIFO read pointer control and coordination and 2) diagnostic register control and interface.

The Bundle Interface Module (BIM) distributes, re-times, and logically combines broadcast signals between all DSC Modules in the bundle. The BIM also functions to combine broadcast module status signals from all DSC Modules within the bundle to interface to the core logic.

The cold training sequence (CTS) preferably consists of a clock channel CTS initiation sequence, delay trimming sequence, phase correction sequence, and a Skew Synchronizing Marker (SSM) byte transmitted on all data channels to perform an initial training sequence. The CTS protocol provides a method for forcing a DSC Module training sequence after a system power-up or reset or an unrecoverable link error.

An extended training sequence (ETS) consists of an ETS command sequence, delay-trimming sequence, phase correction sequence, and an SSM byte transmitted on all data channels to perform an initial training sequence. ETS commands are detected while preventing false training sequences due to single and multi-bit errors across all data channels of the parallel data link.

The warm training sequence (WTS) consists of a WTS command sequence, phase correction sequence, and an SSM byte transmitted on all data channels to perform a periodic training sequence. WTS commands are detected while preventing false training sequences due to single and multi-bit errors across all data channels of the parallel data link.

The system in accordance with the present invention effectively compensates for relative skew on a parallel link. The hardware implementation of the deskewing circuit of the present invention includes a delay line having a reduced parts count and reduced power consumption, and the deskewing circuit is process independent and is scalable.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a timing diagram of a known training sequence in accordance with the HIPPI-6400 training protocol.

FIG. 2 is a block diagram for a known data channel architecture having a conventional delay line.

FIG. 3 is a high-level block diagram of a scalable architecture for a full parallel link in accordance with one embodiment of the present invention having a transmitter bundle node comprised of SSD modules, a physical interconnect of either optic fiber or wire, and receiver bundle node comprised of DSC Modules interconnected via a “Bundle Interface Module” (BIM).

FIG. 4 is a high-level block diagram of a DSC bundle included in the architecture shown in FIG. 3 illustrating major signal ports.

FIG. 5 is a detailed block diagram of the DSC bundle shown in FIG. 4 comprising a plurality of DSC Modules, for example, four modules, and a BIM with hierarchical signals.

FIG. 6 is a block diagram of a four-channel DSC Module (DSC4) including four data channels, a clock channel, and hierarchical signals.

FIG. 7 is a detailed block diagram of the DSC clock channel shown in FIG. 6 having a clock channel front-end logic and circuitry block, Built-In Self-Test (BIST) block, and utility block to interface to the data channel(s) within the DSC Module and to either the BIM or host core logic.

FIG. 8 is a detailed functional block diagram of the DSC clock channel front-end logic and circuitry shown in FIG. 7 and hierarchical signals.

FIG. 9 is a detailed block diagram of an exemplary DSC data channel shown in FIG. 6 having a data channel front-end logic and circuitry block, data FIFO block, and utility block to interface to the clock channel within the DSC Module and to either the BIM or host core logic.

FIG. 10 is a detailed functional block diagram of the DSC data channel front-end logic and circuitry shown in FIG. 9 and hierarchical signals.

FIG. 11 is a detailed functional block diagram of the data FIFO block shown in FIG. 9 comprising a data FIFO register file, read pointer, pattern search sequencer, search logic, SSM start sequencer logic, frame bit counter, write pointer, and glue logic gates.

FIG. 12A is a detailed block diagram of the Bundle Interface Module (BIM) shown in FIG. 5 for an exemplary four-module signal interconnect and logic circuit for signals operating at link speeds. As shown in FIG. 12A, the BIM has a pattern generator resident at DSC Module port 2 as the source and Bundle Timing Module (BTM) masters at the destination nodes of DSC Modules 0, 1, and 3 ports. The signal from the pattern generator is buffered and fed onto interconnecting metal wires to the other module ports, with distributed re-buffering circuits along the signal path length. The BTM drives the bundle logic for the DSC Module ports.

FIG. 12B is a detailed block diagram of an alternative embodiment of the Bundle Interface Module (BIM) similar to the embodiment shown in FIG. 12A but with the pattern generator resident at DSC Module port 3.

FIG. 13A is a detailed block diagram of the Bundle Interface Module (BIM) shown in FIG. 5 for the exemplary case of a four-module signal interconnect and logic circuit for signals operating at core speeds. As shown in FIG. 13A, the source resides at DSC Module port 2, and bundle logic resides at the destination nodes of DSC Modules 0, 1, and 3 ports. The signal from the source node is buffer-driven onto interconnecting metal wires to the other module ports, with distributed re-buffering circuits along the signal path length. At the destination nodes, the signal is received by the bundle logic for the DSC Module ports.

FIG. 13B is a detailed block diagram of an alternative embodiment of the Bundle Interface Module (BIM) similar to the embodiment shown in FIG. 13A but with the source resident at DSC Module port 3.

FIG. 14 is a block diagram of a logic representation of BIM logic for signals not requiring re-timing and synchronization. These signals are referenced to the slower core clock frequency.

FIG. 15 is a block diagram of an interface between the bundle logic and the core logic and illustrates the hierarchical interface signals between the DSC bundle and the core logic of a host ASIC.

FIG. 16 is a block diagram of the link clock strapping showing an exemplary implementation for the distribution link clock signal utilizing the “clock-strapping” method. Also shown is a DSC Module Clock Strapping Strip module on the “link end” of each DSC Module consisting of an input buffer that drives a fan-out buffer of three. A Module Clock Input Buffer functions to receive the clock signal and drive that signal into the clock channel of the DSC Module. The link clock signal is connected to the fan-out input pin only. First-level fan-out link clock signals from DSC Module 0 are connected to the fan-out input pins of DSC Modules 1 and 2 using metal segments of equal dimensions. A first set of second-level fan-out link clock signals from DSC Module 1 connects to the Module Clock Input Buffer pin of DSC Modules 1 and 0. A second set of second-level fan-out link clock signals from DSC Module 1 connects to the Module Clock Input Buffer pin of DSC Modules 2 and 3. The interconnect metal for this second level of fan-out link clock signals is of equal dimensions.

FIG. 17 is a top-level skew range timing diagram showing the warm training sequence traffic across four data channels 0, i, j, and k. Depicted is the maximum skew range allowed of greater than four bytes (e.g., a minimum of 40 UI for an 8b10b encoding scheme and 32 UI for non-AC-encoded protocols) contained within the command detect segment. Segments also depicted are the Clock Calibration, Data Channel Phase Adjust, Clock Re-start, and word framing segments. System data immediately follows the training sequence on all four data channels.

FIG. 18 is a timing diagram of the data channel's measurement period of the data-training pattern (data_train_pattern) referenced to the data-recovery link clock (drc_lnk_clk). Also shown is the captured Data-Phase-Image-Word view of the logical transposed representation and how the Data Image Word actually appears.

FIG. 19 is a timing diagram showing the timing relationship of the data-recovery link clock (drc_lnk_clk) to the standard link clock (std_lnk_clk) used throughout the deskewing circuit.

FIG. 20 is a timing diagram for the SSM byte showing the preceding fence pattern of the training sequence, the SSM byte consisting of the SSM pattern, and post-amble followed by system data.

FIG. 21 is a state flow diagram of the SSM pattern search algorithm used during typical training sequences to detect the SSM pattern. A first phase of the search algorithm is to detect the training fence pattern, which is started after a waiting period of two frame_inc cycles. If no fence pattern is detected, the search terminates and escapes to an error state, and a phase_error signal is set. If the fence pattern is detected, the algorithm begins searching for the SSM pattern. An SSM error is assumed if neither a fence nor SSM pattern has been detected for one cycle of the frame_inc signal, causing the algorithm to exit to an error state setting the ssm_err_flg signal. If an SSM pattern is detected, the algorithm escapes to the “Train Ok” state asserting the flag signal ssm_found. The algorithm remains in the “Train Ok”, “Phs Adj Error”, or “SSM Error” state while signal train_mode remains asserted. The algorithm exits to the “Shutdown” state saving all flag states and deactivating the clock to the SSM search sequence logic.
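By way of illustration only, the state flow described for FIG. 21 may be sketched in software as follows, with one list element standing for one framed byte per frame_inc cycle. The byte values `FENCE` and `SSM`, the state names other than those quoted in the text, and the assumption that the fence must appear in the first frame after the waiting period are hypothetical; the “Shutdown” transition on de-assertion of train_mode is omitted from the sketch.

```python
FENCE = 0b10101010  # hypothetical fence byte; the actual pattern is not given here
SSM = 0b11000101    # hypothetical SSM byte; the actual pattern is not given here


def ssm_search(frames, wait_frames=2):
    """Walk framed bytes and return (final_state, flags)."""
    flags = {"phase_error": False, "ssm_err_flg": False, "ssm_found": False}
    state = "WAIT"
    for n, byte in enumerate(frames):
        if state == "WAIT":
            # Skip the first two frame_inc cycles before looking for the fence.
            if n + 1 >= wait_frames:
                state = "FENCE_SEARCH"
        elif state == "FENCE_SEARCH":
            if byte == FENCE:
                state = "SSM_SEARCH"
            else:
                flags["phase_error"] = True
                state = "PHS_ADJ_ERROR"
        elif state == "SSM_SEARCH":
            if byte == SSM:
                flags["ssm_found"] = True
                state = "TRAIN_OK"
            elif byte != FENCE:
                # Neither fence nor SSM seen in this frame.
                flags["ssm_err_flg"] = True
                state = "SSM_ERROR"
        else:
            break  # terminal states hold until train_mode is released
    return state, flags


print(ssm_search([0x00, 0x00, FENCE, FENCE, SSM, 0x12]))
```

The sketch keeps the three terminal outcomes separate so that a caller can distinguish a failed phase adjustment (no fence) from a lost marker (fence seen, SSM never found).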

FIG. 22 is a state flow diagram of the command pattern search algorithm used for warm training and extended training command bytes. This particular algorithm searches for four consecutive warm training bytes or four consecutive extended training bytes. It is designed to exit if four consecutive command bytes are not detected and resume the search from the beginning.
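By way of illustration only, a search for four consecutive identical command bytes that restarts whenever the run is broken may be sketched as follows. The byte values `WTS_CMD` and `ETS_CMD` and the function name are hypothetical, and treating a switch from one command byte to the other as the first byte of a new run is an assumption of the sketch.

```python
WTS_CMD = 0x3C  # hypothetical warm training command byte
ETS_CMD = 0x5A  # hypothetical extended training command byte


def detect_command(frames, run_length=4):
    """Return "WTS" or "ETS" once run_length identical command bytes arrive
    back to back, or None if no complete run is found."""
    current, count = None, 0
    for byte in frames:
        if byte in (WTS_CMD, ETS_CMD):
            if byte == current:
                count += 1
            else:
                current, count = byte, 1   # a different command starts a new run
            if count == run_length:
                return "WTS" if byte == WTS_CMD else "ETS"
        else:
            current, count = None, 0       # run broken: resume from the beginning
    return None


print(detect_command([WTS_CMD] * 3 + [0x00] + [WTS_CMD] * 4))
```

Requiring an unbroken run is what guards against a single corrupted data byte being mistaken for a training command.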

FIG. 23 is a timing diagram showing the warm training sequence (WTS) consisting of a train command, having a length of four bytes, training fence pattern, and SSM byte followed by system data. Also shown is the link clock signal waveform.

FIG. 24A is a timing diagram showing one of two possible cold training sequences (CTS) initiated by halting the link clock forcing the link_alive signal inactive. Prior to re-starting the link clock, the data training fence pattern is transmitted on the data channel. Re-establishing the link clock causes the link_alive signal to be asserted a short number of clocks thereafter allowing the training of the DSC Module to continue. The training fence pattern is terminated by the SSM byte. System data immediately follows the end of the cold training sequence.

FIG. 24B is a timing diagram showing the second possible cold training sequence initiated by halting the link clock, forcing the link_alive signal inactive. Re-establishing the link clock has the signal link_alive asserted a short number of clocks thereafter, allowing the training of the DSC Module to continue. Prior to the assertion of the link_alive signal, the data training fence pattern is transmitted on the data channel. The training fence pattern is terminated by the SSM byte. System data immediately follows the end of the cold training sequence.

FIG. 25 is the top-level block diagram of the data channel utility block showing data channel daisy-chain gates, the FIFO control, and channel register control sub-blocks. The FIFO control generates FIFO status signals my_fifo_empty_o, my_fifo_full_o, my_data_valid_o, and my_fifo_error_o that are daisy-chained with status signals for other data channels within the DSC module. It also receives broadcast signals my_fifo_empty_i, bim_fifo_full_i, and bim_data_valid_i and generates a rd_cnt_enb signal that controls the read pointer of the data FIFO.

FIG. 26 is the top-level block diagram of the clock channel utility block with logic to implement the clock channel training detection, module-level FIFO control, module-level FIFO control logic blocks, and clock detection circuitry. Also shown are logic gates representing glue logic that generates intermediate signals used by the logic blocks.

FIG. 27 is a top-level block diagram of the module-level FIFO Control block with logic gates combining the left and right segments of received daisy-chained FIFO status signals to generate the FIFO status signals of that DSC Module. A block representing the module-level FIFO control logic receiving FIFO status signals from other DSC Modules within the bundle via the BIM is also shown.

FIG. 28 is a top-level representation of the distributed daisy-AND and daisy-OR nets used to interconnect data channels of a DSC Module to the clock channel. A network is shown having both a left and right segment that terminate at the utility block of the clock channel. In the utility block of the clock channel, a logic gate of the same function combines the left and right segments of the daisy-AND or daisy-OR nets.

FIG. 29 is a flow chart of the modified binary search algorithm implemented to determine the sub-tap delay used during the delay trimming process.

FIG. 30 shows a modified binary search algorithm to determine the trim_c2d_sel vector which will reduce the residual phase error to a value of less than 1/n of a tap delay. The “n” represents the resolution of the sub-tap delay value.

FIG. 31 is a top-level block diagram representation of a bundle of four DSC Modules connecting to a BIM and depicting the interconnecting signaling hierarchy.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In accordance with the various embodiments of the present invention and referring now to the figures, wherein like reference numerals identify like elements of the various embodiments of the invention, one can effectively perform relative dynamic skew compensation of parallel data lines. Additionally, the hardware implementation of the deskewing circuit in accordance with the present invention includes a delay line having a reduced parts count and reduced power consumption, and the deskewing circuit is process independent and is scalable.

A preferred embodiment of the deskewing system in accordance with the present invention is shown in FIG. 3. Generally, the link architecture and operation are as follows.

At a source node, a transmitter 300 comprised of one or more Source Synchronous Driver (SSD) Modules 310-0, 310-1, . . . 310-N drives data onto physical media, for example, optic fiber or copper ribbon cable, forming a parallel bus 320 consisting of a plurality of data channels 0, 1, . . . M with data edge-aligned to either the leading or trailing edge of a link clock. As clock and data signals propagate over the copper or optic fiber media, the signals on each channel experience propagation phenomena that are unique and uncorrelated to other signals on the parallel bus, resulting in different arrival times at a destination node where a receiver 340 resides. This difference in arrival times is referred to as “skew.”

In order to successfully sample the received data, the skew between all data channels 0, 1, . . . M in the parallel bus 320 must be eliminated or significantly reduced and the link clock edge substantially centered within the data bit time period. Adjustments to compensate for channel-to-channel skew and centering of the link clock edge are performed by DSC Modules 350-0, 350-1, . . . 350-N during a cold training sequence (CTS), extended training sequence (ETS), or warm training sequence (WTS) operation, as will be described in more detail later. Both ETS and WTS operations assume that the link has been properly compensated and the link clock edge centered during an earlier CTS.

During normal operation, the primary functions of the DSC Modules 350-0, 350-1, . . . 350-N are the control of a data FIFO across a bundle and the detection of extended and warm training command sequences across the bundle. Data FIFO control, in particular, its read pointer across the bundle, is accomplished with a two-level distributed control scheme with a first level of daisy-chained signals within a DSC Module 350 and second-level broadcast status signals at the bundle level. This provides a means of scaling the coordination of the data FIFO across the bundle.

Normal operation of the data FIFO is as follows. As shown in FIG. 11, a write-pointer-address vector wr_pntr_adr of the data FIFO block is sampled on each rising edge of the core clock and synchronized to the core clock domain by the FIFO-Status-Logic block, as shown in FIG. 25. The Read Pointer 1110 address vector rd_pntr_adr is also sampled on each rising edge of the core clock and processed with the synchronized wr_pntr_adr. The read and write FIFO addresses are compared, and the FIFO status is determined, namely, state of data, FIFO full or empty state, and any error condition.
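By way of illustration only, the comparison of read and write addresses to derive FIFO status may be sketched as follows. This sketch substitutes a conventional technique, namely plain binary pointers that count modulo twice the FIFO depth, for the “Ordered Population Count” encoding of the invention, which is not reproduced here. The function name, the dictionary keys, and the rule that an occupancy above the depth indicates an error are assumptions of the sketch.

```python
def fifo_status(wr_pntr_adr, rd_pntr_adr, depth):
    """Derive status flags from already-synchronized pointer values.

    Both pointers are assumed to count modulo 2 * depth, so that a full FIFO
    (occupancy == depth) can be told apart from an empty one (occupancy == 0).
    """
    occupancy = (wr_pntr_adr - rd_pntr_adr) % (2 * depth)
    return {
        "data_valid": 0 < occupancy <= depth,  # at least one unread entry
        "fifo_empty": occupancy == 0,
        "fifo_full": occupancy == depth,
        "fifo_error": occupancy > depth,       # pointers have overrun each other
    }


print(fifo_status(wr_pntr_adr=11, rd_pntr_adr=3, depth=8))
```

A binary pointer of this kind changes several bits at once when it wraps, which is one reason a sequence with Gray-code characteristics is attractive when the pointer must be sampled in another clock domain.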

At each data channel the status of the associated data FIFO is processedwith the status of the other data FIFO blocks to generate daisy-chainedstatus signals: my_data_valid_o, my_fifo_full_o, my_fifo_empty_o, andmy_fifo_error_o, as shown in FIG. 25. The status signals from thedaisy-chained segments are combined at the utility block for a DSC ClockChannel 600 shown in FIG. 6 to create module bundle signals out,including bn/rx_data_valid, bn/rx_fifo_full, bn/rx_fifo_empty, andbn/rx_error shown in FIG. 15, that are broadcast to all DSC Modules inthe bundle via the BIM 360 shown in FIG. 3. The distributed statussignals are further processed at the exit port of the BIM 360 to each ofthe DSC Modules 350-0, 350-1, . . . 350-N and are passed to the DSCModules as bim_data_valid, bim_fifo_full, bim_fifo_empty, andbim_fifo_error shown in FIG. 14.

The distributed FIFO status signals from the BIM 360 (bim_bundle signals in, shown in FIG. 6) are received by the DSC Clock Channel 600 and distributed to each of the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . . Depending on the DSC-to-core interface desired, control logic of the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . receives part or all of the broadcast status signals and controls the rd_cnt_enb signal to the Read Pointer 1110 shown in FIG. 11 to advance or hold the Read-Pointer address.

ETS and WTS command searches are another primary function of the DSC Modules 350-0, 350-1, . . . 350-N. Detection of command sequences must ensure against “false” or “split” training scenarios in which a portion of the links in a bundle enters training while other links continue in normal operation.

Considered in more detail, during normal operation the command detect is as follows. The transmitter 300 shown in FIG. 3 initiates an extended training sequence (ETS) or warm training sequence (WTS) with the transmission of respective ETS or WTS command byte sequences. As shown in FIG. 11, a Pattern Search FSM block 1120 performs both command-sequence pattern and SSM-byte pattern searches during normal and training mode operations, respectively, on recovered data temporarily stored in the Data FIFO Register File 1170. With a signal train_mode in its non-asserted state, the Pattern Search FSM block 1120 searches for training command sequences. Signals frame_inc and byte_marker from a Frame Bit Counter 1160 are used by the Pattern Search FSM block 1120 to frame received command bytes. The Pattern Search FSM block 1120 asserts a signal train_det1_flg, train_det2_flg, or cmd_err_flg for a WTS command sequence, ETS command sequence, or for an error condition detected during the command sequences, respectively, and the asserted signal is passed to the utility block. As shown in FIG. 25, signals train_det1_flg, train_det2_flg, and cmd_err_flg are each incorporated into incoming daisy-chained signals my_train_det1_i, my_train_det2_i, my_cmd_err_i, and my_train_det_i from the previous data channel before being passed as my_train_det1_o, my_train_det2_o, my_cmd_err_o, and my_train_det_o to the next data channel or terminating at the utility block of the clock channel. Signals my_train_det1_o and my_train_det2_o are the result of the daisy-AND function of my_train_det1_i, my_train_det2_i, train_det1_flg, and train_det2_flg. Signal my_cmd_err_o is the result of the daisy-OR function of my_cmd_err_i with cmd_err_flg, while my_train_det_o is the result of the daisy-OR function of signals my_train_det1_i, my_train_det2_i with my_train_det_i. Signal my_train_det_o is the daisy-OR function of my_train_det_i with the logic-OR of signals train_det1_flg with train_det2_flg of each data channel FIFO block.
Signal my_train_det_o is used to inform the clock channel that at least one data channel within the DSC Module has detected a command sequence and initiated either an extended or warm training sequence. For the case of a command sequence error condition, the Pattern Search FSM block 1120 shown in FIG. 11 resumes the command sequence search.
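The daisy-chained combination of per-channel flags can be sketched as follows. This is only an illustrative model: the ChainSignals container, the seed values at the head of the chain, and the function names are assumptions, and the sketch takes the reading in which my_train_det_o is the daisy-OR of my_train_det_i with the logic-OR of the two local detect flags.

```python
from dataclasses import dataclass


@dataclass
class ChainSignals:
    # Assumed seed values at the head of the chain: AND chains start true,
    # OR chains start false.
    train_det1: bool = True
    train_det2: bool = True
    cmd_err: bool = False
    train_det: bool = False


def chain_stage(inp, train_det1_flg, train_det2_flg, cmd_err_flg):
    """Combine one data channel's local flags with the incoming (_i) signals."""
    return ChainSignals(
        train_det1=inp.train_det1 and train_det1_flg,   # daisy-AND
        train_det2=inp.train_det2 and train_det2_flg,   # daisy-AND
        cmd_err=inp.cmd_err or cmd_err_flg,             # daisy-OR
        train_det=inp.train_det or train_det1_flg or train_det2_flg,  # daisy-OR
    )


def run_chain(channel_flags):
    """Propagate (train_det1_flg, train_det2_flg, cmd_err_flg) tuples down the chain."""
    sig = ChainSignals()
    for flags in channel_flags:
        sig = chain_stage(sig, *flags)
    return sig
```

With every channel reporting a WTS command, the AND chain for train_det1 stays asserted at the end of the chain. If only some channels report it, the AND chain drops while the OR chain train_det still shows that at least one channel detected a command, which is the "split" case the detection logic is meant to catch.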

As shown in FIG. 28, left and right segments of signals my_train_det1_o, my_train_det2_o, my_cmd_err_o, and my_train_det_o terminate at the utility block of the Clock Channel 600 and are the products of the respective daisy-AND and daisy-OR functions of all the data channels of a DSC Module 350. Logic in the utility block operates on these signals received from the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . and generates signals clk_train_strt (for example clk_train_strt1 and clk_train_strt2 shown in FIG. 26). Signal clk_train_strt1 or clk_train_strt2 is asserted if and only if my_train_det1_o or my_train_det2_o respectively is asserted and signals one_ssm_det_flg and my_cmd_err_o are not asserted (for example, clk_train_strt1 is asserted if and only if my_train_det1_o is asserted and my_cmd_err_o and one_ssm_det_o are not asserted). Signals clk_train_strt1 and clk_train_strt2 shown in FIG. 7 are used to initiate a warm training sequence (WTS) or extended training sequence (ETS) respectively in the clock channel only after all data channels in the DSC Module 350 have successfully detected and completed their training sequences.
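The gating condition stated above reduces to a small Boolean function. The following sketch uses a hypothetical function name and ignores clocking and synchronization; it only restates the "if and only if" rule from the text.

```python
def clk_train_strt(my_train_det1_o: bool, my_train_det2_o: bool,
                   my_cmd_err_o: bool, one_ssm_det_flg: bool):
    """Return (clk_train_strt1, clk_train_strt2) for the clock channel.

    Each start signal follows its detect signal unless a command error or an
    early SSM detection blocks it.
    """
    blocked = my_cmd_err_o or one_ssm_det_flg
    return (my_train_det1_o and not blocked,
            my_train_det2_o and not blocked)
```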

In the case where an error has been detected in the command sequence, the Pattern Search FSM 1120 in the data channel FIFO block asserts the cmd_err_flg signal and re-initiates its search for a training command sequence. Assertion of the cmd_err_flg signal in a data channel is logically-ORed with an incoming daisy-chained signal my_cmd_err_i, producing signal my_cmd_err_o that is propagated to the next data channel, eventually terminating at the utility block of the clock channel. The assertion of my_cmd_err_o prevents the assertion of either the clk_train_strt1 or clk_train_strt2 signal and prevents the clock channel from executing a training sequence.

In a second error condition, excessive skew beyond the specified limit is experienced by at least one data channel, causing that data channel to enter its training sequence much later than other data channels in the bundle. The effect is that the clk_train_strt1 and clk_train_strt2 signals shown in FIG. 7 that start the clock channel training sequence are significantly delayed. If the one_ssm_det_flg signal shown in FIG. 26 is asserted prior to the assertion of the clk_train_strt signals, the clock channel aborts its training sequence.

The assertion of the ssm_found_flg signal by the Pattern Search FSM 1120 of the data channel FIFO block is incorporated in the daisy-AND tree of signals my_all_ssm_found (for example, my_all_ssm_found_i and my_all_ssm_found_o shown in FIG. 25) and the daisy-OR tree of signals my_one_ssm_det (for example, my_one_ssm_det_i with my_one_ssm_det_o shown in FIG. 25) with all of these signals terminating at the utility block of the clock channel. The clock channel train detect logic asserts signal clk_train_strt1 shown in FIG. 7 if and only if one_ssm_det_flg (shown in FIG. 26) and cmd_err_flg (shown in FIG. 25) signals are not asserted.

A general description of the warm training sequence (WTS) will now be provided. A warm training sequence is a sub-operation of the more comprehensive cold training sequence (CTS) and extended training sequence (ETS) operations, and can also be an independent operation. The purpose of WTS operation is to adjust for environmental conditions and the effects of those conditions on the operation of the entire link. The frequency of WTS or ETS must be sufficient to compensate for environmentally induced skew effects. This frequency of training is highly dependent on the environment and the application. Of the three training types, WTS is the most common and requires the least amount of time to adjust for any shift in the skew profiles.

WTS operations are initiated at the source node with the SSD Modules 310-0, 310-1, . . . 310-N transmitting a WTS command on all link data channels 320. The WTS command has a minimum number of consecutive command bytes (typically four bytes) that have a unique pattern not found in the normal data stream. In accordance with one modification of the present invention, alternative methods include adding additional channels to identify a training sequence, or a “byte count” may also be used to initiate a WTS command. An “error-free” minimal length or byte count for the WTS command is required to assure that random bit errors cannot initiate a false or spontaneous training sequence.
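The requirement for a minimum run of error-free command bytes can be sketched as a simple run-length detector. The function name, the default run length of four, and the command byte value used in the usage note are hypothetical; the specification does not disclose the actual byte pattern.

```python
def detect_command(byte_stream, command_byte, min_run=4):
    """Return the index of the byte that completes min_run consecutive
    command bytes, or None if no such run occurs.

    Any non-command byte (for example one corrupted by a bit error) resets
    the run, so an isolated match cannot start a training sequence.
    """
    run = 0
    for index, value in enumerate(byte_stream):
        run = run + 1 if value == command_byte else 0
        if run == min_run:
            return index
    return None
```

For example, with an assumed command byte of 0xBC, the stream 0x00, 0xBC, 0xBC, 0x01, 0xBC, 0xBC, 0xBC, 0xBC is accepted only at its final byte, because the earlier run of two is broken by 0x01.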

The clock channel operation during a warm training sequence will now be described with reference to FIGS. 7 and 8. During a warm training sequence, the front end of the DSC Clock Channel 600 determines the clock offset value required for a data-recovery link clock, drc_lnk_clk, to center its edges within the data bit. Measuring the period of the link clock signal and dividing this value by a factor of four produces a preferred offset value of one-quarter lambda. During this period, a phs_adj_done signal is deactivated for a period of time after the assertion of a signal train_strt1 while the drc_lnk_clk assumes the newly acquired offset value.

Considered in more detail, in the Clock Channel front-end block 700, the assertion of train_strt1 deactivates the phs_adj_done signal while enabling a clock signal std_lnk_clk to its state machine 860. Signal phs_adj_done is broadcast as mod_clk_adj_done to all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . of the DSC Module 350. When asserted, the phs_adj_done (mod_clk_adj_done) signals permit the DSC Data Channel front ends to complete their training sequences. Once activated, the state machine asserts its meas_strt signal to create a meas_pulse signal at the input of a Delay Line 840. On the next clock cycle, a signal meas_cntl is asserted, capturing the clock period measurement in the Clock Image Latch 800 shown in FIG. 8 and terminating meas_pulse one clock cycle later, returning to the normal link clock waveform. The clock image word is an N-bit word representing the clock period of the received link clock signal in terms of delay taps or simply “taps”. A “one-hot” decode of the clock image word is performed by Image Decode Logic 810 which locates the tap of the first 1-to-0 transition, representing the period of the link clock signal. A String-to-Binary Encoder circuit 820 converts the “one-hot” tap location to a binary value and generates a tap_decode vector. The tap_decode vector output of the String-to-Binary Encoder circuit 820 is a binary value of the link clock period that is processed further by a DNA logic block 830 to generate the final tap_adr signals. The DNA logic block 830 performs a set of arithmetic operations on the tap_decode vector to arrive at the final tap_adr value. As shown in FIG. 29A, in a first and second operation the binary value of the Clock-Image-Latch distribution delay is subtracted from the tap_decode vector, and then the adjusted value is right-shifted two positions to derive a divide-by-four value. In a third operation a balance delay value is added to the results of the divide-by-four operation to produce the final tap_adr value.
The balance delay value is required to align the edges of the drc_lnk_clk and std_lnk_clk clock signals. The clock channel completes its portion of the WTS operation with the assertion of a phs_adj_done signal (mod_clk_adj_done) shown in FIG. 8. Signal phs_adj_done (mod_clk_adj_done) is asserted a sufficient number of clock cycles after meas_cntl assertion to assure that all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . have enough time to restart their data recovery string to detect the SSM byte.
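The clock-channel arithmetic described above (locate the first 1-to-0 transition, subtract the latch distribution delay, right-shift by two, add the balance delay) can be sketched as follows. The function and parameter names are hypothetical, the clock image is modeled as a list of bits indexed by tap, and the convention that the reported tap is the index of the first 0 following a 1 is an assumption.

```python
def first_one_to_zero(image_bits):
    """Index of the tap at which the image first changes from 1 to 0, else None.

    Assumption: the tap reported is the position of the first 0 after a 1.
    """
    for tap in range(1, len(image_bits)):
        if image_bits[tap - 1] == 1 and image_bits[tap] == 0:
            return tap
    return None


def clock_tap_adr(image_bits, latch_dist_delay, balance_delay, shift=2):
    """Model of the three DNA operations for the clock channel.

    shift=2 gives the divide-by-four (quarter-lambda) offset.
    """
    tap_decode = first_one_to_zero(image_bits)
    if tap_decode is None:
        raise ValueError("no 1-to-0 transition found in clock image")
    adjusted = tap_decode - latch_dist_delay    # first operation: subtract
    quarter = adjusted >> shift                 # second operation: divide by four
    return quarter + balance_delay              # third operation: add balance delay
```

As a worked case under these assumptions, an image of twenty 1s followed by twelve 0s decodes to tap 20; with a distribution delay of 4 taps and a balance delay of 3 taps the result is ((20 - 4) >> 2) + 3 = 7.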

For purposes of testing, the final tap_adr value can be further modified by adding or subtracting a registered offset value, a csr_clock_ch_offset vector value, and/or changing the value of the csr_lambda_scaler vector shown in FIG. 8 and FIG. 29A to change the default divide-by-four factor to another predefined value. A divide-by-four factor is the preferred csr_lambda_scaler value that centers the clock edge in the bit period while minimizing sampling and edge placement calculation errors.

A detailed description of the data channel operation during a warm training sequence will now be provided. In the DSC Data Channel front-end 910, the assertion of train_strt1 deactivates local signal phs_adj_done and enables clock signal std_lnk_clk_b to its state machine 1000 shown in FIG. 10. Deactivation of phs_adj_done forces signals pos_data_bit and neg_data_bit to a zero value. Once activated, the Data Channel FSM 1000 sets the value of vectors mode_sel and trim_c2d_sel and asserts a signal reflow_conf_en. The value of the mode_sel vector selects the lnk_data_in signal of the data channel input multiplexer while the trim_c2d_sel vector selects the minimum sub-tap “padding” delay. The assertion of the signal reflow_conf_en enables the “re-flow” measurement configuration. The “re-flow” measurement configuration causes the signal applied to the “event input” of the Delay Line 1070 to be routed to the output of the delay line without passing through the delay stages of the delay line. A second signal applied to the “echo input” is routed to pass through the delay stages of the Delay Line, but is blocked from the output of the delay line. Hence, a delay image of the signal applied to the “echo input” can be recorded while a signal applied to the “event input” simultaneously passes through the input and output circuitry of the Delay Line 1070, bypassing the delay stages.

A signal meas_en is asserted by the FSM 1000, capturing a data phase word in a Data Image Latch 1010 on the next rising edge of the link clock std_lnk_clk. It is important to note that the capture of a data phase word is referenced to the link clock std_lnk_clk that represents drc_lnk_clk with zero offset. Hence, the data phase word represents the phase relationship of the data stream to the data-recovery link clock drc_lnk_clk with an offset value of zero. The data image word is an N-bit word representing the phase error of the received link data signal in terms of delay taps or simply “taps”. A “one-hot” decode of the data image word is performed by Image Decode Logic 1020 which locates the tap position of the first 1-to-0 transition, when data_image_latch[0] equals one, or the 0-to-1 transition, when data_image_latch[0] equals zero. A String-to-Binary Encoder circuit 1030 converts the “one-hot” tap location to a binary value and generates the tap_decode vector shown in FIG. 10. The FSM 1000 saves the initial tap_decode vector in a DNA register and initiates a modified binary search algorithm as shown in FIG. 30 to determine the trim_c2d_sel vector which will reduce the residual phase error to a value of less than 1/n of a tap delay. The “n” represents the resolution of the sub-tap delay value.
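The polarity-dependent decode of the data image word can be sketched briefly. Because every bit before the first transition equals data_image_latch[0], both cases reduce to finding the first tap whose value differs from bit 0. The function name and the list-of-bits model are assumptions for illustration only.

```python
def data_tap_decode(data_image_latch):
    """Return the tap index of the first transition away from bit 0.

    If bit 0 is 1 this is the first 1-to-0 transition; if bit 0 is 0 it is
    the first 0-to-1 transition. Returns None when no transition is present.
    """
    first = data_image_latch[0]
    for tap in range(1, len(data_image_latch)):
        if data_image_latch[tap] != first:
            return tap
    return None
```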

The tap_decode vector output of the String-to-Binary Encoder circuit 1030 is a binary value of the link data phase error that is processed further by a DNA logic block 1040 to generate the final tap_adr signals, as shown in FIG. 10. For purposes of testing, the final tap_adr value can be further modified by adding or subtracting a registered offset value, a csr_data_ch_offset vector. The DNA logic block 1040 performs a set of arithmetic operations on the tap_decode vector to arrive at the final tap_adr value. As shown in FIG. 29B, in a first and second operation the binary value of the Data-Image-Latch distribution delay is subtracted from the tap_decode vector, and the balance delay value that is required to align the edges of the drc_lnk_clk clock and phs_adj_data signals is added to arrive at the final tap_adr value. The signal phs_adj_done is asserted a sufficient number of clock cycles after meas_en is asserted to assure that phase-corrected data has propagated through a Data Recovery String 1060. When all data channels within the DSC Module 350 have completed their phase correction sequence, the DSC Clock Channel 600 deactivates the mod_clk_adj_done signal to begin its training sequence. When deactivated, signal mod_clk_adj_done deactivates the phs_adj_done signals for all data channels within the DSC Module 350. When the clock channel has completed its training sequence, it asserts signal mod_clk_adj_done, allowing the data channels to assert their phs_adj_done signals. Signal mod_clk_adj_done is asserted with sufficient time for the data channels to assert their phs_adj_done signals to initialize the Data Recovery String 1060. Assertion of phs_adj_done signals indicates the end of the data channel phase correction sequence and enables data output signals neg_data_bit and pos_data_bit to the Data FIFO block 900 shown in FIG. 9.

Referring again to FIG. 11, in the Data FIFO block 900, the assertion of train_strt1 resets an SSM Start Sequencer 1130 and activates its std_lnk_clk. Reset of the SSM Start Sequencer 1130 also deactivates a signal train_mode, forcing a Write Pointer 1140 to a zero value. Hence, during a WTS operation, only Data FIFO Register File 1170 location zero is selected.

The neg_data_bit and pos_data_bit streams are written into the Data FIFO stack 1170 on the rising edge of std_lnk_clk. The Data FIFO stack 1170 is preferably implemented as an addressable “write-per-bit” FIFO to minimize power dissipation. The Data FIFO write address, fifo_bit_adr, from a FIFO Address Encoder 1150 is derived from the Write Pointer 1140 and the Frame Bit Counter logic block 1160 along with the state of a signal pos_pattrn_flg. The pos_pattrn_flg signal from the Pattern Search FSM 1120 is a logical one if the positive bits of the Data FIFO stack 1170 all have a value of logical one; otherwise signal pos_pattrn_flg is a logical zero. The FIFO read and write order is altered depending on the state of pos_pattrn_flg to maintain proper byte framing due to negative phase alignment of the data stream from the DSC Data Channel front-end block 910. The fifo_bit_adr is updated with the state of pos_pattrn_flg upon the deactivation of the train_mode signal during the SSM byte.

With the assertion of train_mode and phs_adj_done, the Pattern Search FSM 1120 begins SSM and training fence pattern searches. During normal operation, signal phs_adj_done will be deactivated when the clock channel enters into its training sequence and reactivated once the clock channel has completed its training. During the period of phs_adj_done signal deactivation the Pattern Search FSM 1120 suspends its search for the SSM byte and training fence pattern and resumes its search once phs_adj_done has been asserted. If after two transitions of frame_inc the training fence pattern has not been detected, a signal ssm_err_flg is asserted. Waiting two frame_inc transitions assures that the DSC Data Channel front-end block 910 has had time to phase-adjust the data stream to be properly sampled. The SSM search continues, and a signal ssm_found_flg is asserted once the SSM pattern has been detected. If a non-training fence pattern or non-SSM patterns are detected during this time, the ssm_err_flg signal is asserted, ending the SSM search. The states of the ssm_found_flg and ssm_err_flg are maintained upon the deactivation of the train_mode signal.

The states of the ssm_found_flg and ssm_err_flg signals are fed to the DSC Clock Channel 600 and used internally by the DSC Data Channels 610-0, 610-1, 610-2, 610-3 . . . , as well. The DSC Clock Channel 600 processes these signals and broadcasts the results to all DSC Data Channels 610-0, 610-1, 610-2, 610-3 . . . within the DSC Module 350, as well as to other DSC Modules within the bundle. Internal to the DSC Data Channel 610, these signals are used by the SSM Start Sequencer logic 1130 shown in FIG. 11 that sets the proper byte framing.

The SSM Start Sequencer 1130 aligns and initializes the Write Pointer 1140 and Frame Bit Counter 1160 to properly frame the data byte. This procedure assures that the first data bit received after the warm training sequence is written to Data FIFO stack 1170 location zero. Upon the assertion of the ssm_found_flg signal, the SSM Start Sequencer 1130 asserts a frame_init signal, while deactivating a frame_strt signal, as shown in FIG. 11. The frame_init signal forces the Frame Bit Counter 1160 to a zero value to generate a fifo_bit_adr vector of zero at which the first data bit will be stored. The Write Pointer 1140 and Frame Bit Counter 1160 are held at this initial state until the SSM Start Sequencer 1130 asserts frame_strt two UI from the end of the SSM byte to start the Frame Bit Counter 1160. The SSM Start Sequencer 1130 deactivates the train_mode signal at the end of the SSM byte. In the case of an SSM error, an ssm_err_flg signal is asserted and used as part of a process to observe error-framed data, in which the DSC Data Channel 610 outputs data even though it is incorrectly framed, allowing the system to diagnose the error condition.

Logic to generate the fifo_bit_adr vector is comprised of the Write Pointer 1140, Frame Bit Counter 1160, and FIFO Address Encoder 1150 logic blocks. As mentioned earlier, the Data FIFO stack 1170 is preferably implemented as a “write-per-bit” where each bit of data is written to a particular bit location pointed to by the fifo_bit_adr vector. The frame_adr and wr_pntr_adr vectors are used by the FIFO Address Encoder 1150 logic to generate the fifo_bit_adr vector. The Frame Bit Counter 1160 is sized for two-byte count intervals, which aligns double frames on an integer number of link clock cycles. Although not required for fifo_bit_adr vector generation, the write pointer is sized to match the read pointer, which reduces the logic needed for Data FIFO status calculations. The least significant portion (or LSB) of the wr_pntr_adr vector is not used by the FIFO Address Encoder 1150 to generate the fifo_bit_adr vector, thus requiring the write pointer to be incremented twice. This requires the Frame Bit Counter 1160 to assert a frame_inc signal to increment the Write Pointer 1140 value on each byte time interval. It should be noted that intermediate increment values of the write pointer do not affect the fifo_bit_adr vector, but they allow for read-to-write pointer tracking. Configuration of the Write Pointer 1140 and Frame Bit Counter 1160 in this way accommodates an odd-numbered encoding scheme (for example, 8b9b) having reduced design complexity.

The DSC Clock Channel 600 receives ssm_found_flg and ssm_err_flg signals from all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . within the DSC Module 350, as shown in FIG. 11. As shown in FIG. 26, a bim_all_ssm_found signal is the logical “AND” of all ssm_found_flg signals from the DSC Data Channels 610 and is synchronized to the core clock domain before it is broadcast to all DSC Modules 350-0, 350-1, . . . 350-N within the bundle via the BIM 360. A bim_ssm_err_flg signal is the logical “OR” of all ssm_err_flg signals from the DSC Data Channels 610 and is synchronized to the core clock domain before it is broadcast to all DSC Modules 350 within the bundle via the BIM 360. Each DSC Module 350 in the bundle receives the broadcast signals bim_ssm_err_flg and bim_all_ssm_found via the export port of the BIM, as shown in FIG. 14. The module-level FIFO Control in the utility block of the DSC Clock Channel 600 shown in FIG. 27 receives these signals and generates a mod_rd_cnt_strt signal that is broadcast to all DSC Data Channels 610 in the DSC Module 350. The mod_rd_cnt_strt signal is the logical “OR” of bim_ssm_err_flg and bim_all_ssm_found signals. The assertion of the mod_rd_cnt_strt signal marks the end of the warm training sequence and the start of normal operation.
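The combination of SSM flags into the read-start signal can be summarized in a few lines. This sketch models only the Boolean relationships stated above; the function name is hypothetical, and synchronization to the core clock domain and distribution through the BIM 360 are not modeled.

```python
def bundle_ssm_status(ssm_found_flgs, ssm_err_flgs):
    """Return (bim_all_ssm_found, bim_ssm_err_flg, mod_rd_cnt_strt).

    ssm_found_flgs and ssm_err_flgs hold one Boolean per data channel.
    """
    bim_all_ssm_found = all(ssm_found_flgs)   # logical AND across channels
    bim_ssm_err_flg = any(ssm_err_flgs)       # logical OR across channels
    mod_rd_cnt_strt = bim_ssm_err_flg or bim_all_ssm_found
    return bim_all_ssm_found, bim_ssm_err_flg, mod_rd_cnt_strt
```

Note that under this rule the read counters also start when any channel reports an SSM error, which is consistent with the description of incorrectly framed data being output for diagnosis.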

The cold training sequence (CTS), which provides a method to initiate a DSC training sequence without the requirement of correctly sampling link data, will now be described in detail. Cold training sequences are required at system power-up, system reset, or in the case of unrecoverable link errors. Cold training sequences are a super-set of warm training sequence (WTS) operations having a first segment pertaining to delay path matching/calibration and a second segment of link skew compensation, as performed during a warm training sequence. CTS operations function as a link reset causing the DSC Modules 350-0, 350-1, . . . 350-N to reset and clear all calibration values obtained during previous training sequences and precipitate an immediate re-train. A cold training sequence is initiated by stopping the link clock switching by holding the link clock signal at either a logical one or zero state for a minimum specified amount of time to assure detection by all DSC Modules 350-0, 350-1, . . . 350-N. CTS operations can be initiated at any time. In accordance with one modification of the present invention, alternative methods include slowing the link clock frequency or providing a non-fence pattern on the link clock channel. Any of these methods produces a detectable “link-reset signature” to initiate CTS operation. However, doing so invalidates any received data being processed by the DSC Modules 350-0, 350-1, . . . 350-N at the time.

Once CTS operation has started, the DSC Modules 350-0, 350-1, . . . 350-N begin path delay matching/calibration, hereafter referred to as “trimming”. Preferably, trimming is first completed in the DSC Clock Channels 600 followed by all Data Channels 610. Only after the trimming process has completed is skew compensation of the link initiated.

DSC Clock Channel 600 delay trimming is a two-stage process. The first stage provides delay compensation for a Clock Image Latch 800 distribution, and the second stage is path-delay matching of clock signals drc_lnk_clk (data-recovery link clock) and std_lnk_clk (standard link clock). Similar operations occur in the DSC Data Channels 610. First, delay compensation for the Data Image Latch 1010 distribution is performed, and, second, the path-delay matching of phase-adjusted data to the data-recovery link clock drc_lnk_clk is performed. In practice, the Data Image Latch 1010 distribution delay trimming for both clock and DSC Data Channels 610 is performed concurrently to reduce the time required for training. The delay-path match trimming between the clock signals drc_lnk_clk and std_lnk_clk must be completed in the DSC Clock Channel 600 before the DSC Data Channels 610 complete the trimming process. In all cases, trimmed delays are required to achieve an optimal accuracy of less than one-eighth of one tap delay. Once all trimming operations have completed, the previously described normal warm training sequence operation is initiated as the final operational phase of the CTS.

Considered in more detail, the DSC Clock Channel 600 shown in FIG. 6 operates as follows during a cold training sequence (CTS). A cold training sequence is initiated when the lnk_clock_in signal is held at either logic state (for example, logical high or logical low) for a specified period of time. As shown in FIGS. 24A and 24B, the lnk_clock_in signal is monitored by a Clock Detect circuit, shown in FIG. 26 and residing in the DSC Clock Channel 600 utility block, which asserts reset_lclk shown in FIG. 8 and deactivates link_alive when no switching activity on lnk_clock_in has been detected for a specified period of time. This state is referred to as “link reset”. Exit from a link reset occurs with resumed switching activity of the link clock, when all DSC Modules 350-0, 350-1, . . . 350-N within the bundle automatically enter an extended training sequence. At this time, the training pattern is and must be present at each lnk_data_in pin of each DSC Data Channel 610 shown in FIG. 5.

Once lnk_clock_in switching activity resumes, the reset_lclk signal is deactivated, and link_alive is asserted synchronously a minimum number of clock cycles thereafter. As shown in FIG. 10, the assertion of reset_lclk sets a signal train_strt2 and deactivates phs_adj_done (mod_clk_adj_done) to all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . in the DSC Module 350. The reset_lclk signal assertion also activates link clock std_lnk_clk_b to all of the FSM circuits 1000 in the clock and data channels. The assertion of the train_strt2 signal launches the extended training sequence (ETS) in the clock. In the data channels, assertion of reset_lclk sets local signal train_strt2, while the deactivation of phs_adj_done (mod_clk_adj_done) prevents the DSC Data Channels 610 from completing their clock-to-data-path delay trimming process.

In the DSC Clock Channel 600, the distribution delay trimming of the Clock Image Latch 800 begins with its finite state machine or FSM 860 setting the mode_sel[m:0] and trim_LDD[2:0] vectors, as shown in FIG. 8. The mode_sel[m:0] vector is set to select the meas_cntl input of the DSC Clock Channel 600 input multiplexer, while the trim_LDD[2:0] vector is set to its minimum value. After a required minimum number of link-clock cycles, the meas_cntl signals are asserted, launching a measurement ping pulse down the Delay Line 840, while activating the Clock Image Latch 800 to capture the initial distribution delay image. A dna_latch_en signal is asserted by the FSM 860 to save the binary-encoded value in the DNA logic latch 830. The FSM 860 begins the process to refine the distribution delay image by deactivating the meas_cntl signals and incrementing the trim_LDD vector value to increase the latch distribution delay by approximately one-eighth of one tap delay. The meas_cntl signal is asserted once again to capture the incremented distribution delay value. A match_LDD signal is asserted when the binary-encoded value of the Clock Image Latch 800 equals the previous latched value of the latch distribution delay image. The FSM 860 monitors the state of the match_LDD signal and performs one of three functions. First, if the match_LDD signal remains asserted and the trim_LDD vector does not equal its maximum value, the trim_LDD vector value is incremented, and the distribution delay image is recaptured. Second, if the match_LDD signal is deactivated, the dna_latch_en signal is cycled to save the most recent delay-image value of the distribution delay image, and the FSM 860 proceeds to the next phase of the trimming procedure. Third, if signal match_LDD remains asserted and the trim_LDD vector equals its maximum value, the FSM 860 resets the trim_LDD vector to zero and proceeds to the next phase of the trimming procedure.
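The three-way decision made by the FSM 860 during latch distribution delay trimming can be sketched as a loop. This is a behavioral illustration only: the function name, the capture callback (which stands in for asserting meas_cntl and reading back the binary-encoded image at a given trim_LDD setting), and the default maximum of 7 for the three-bit trim_LDD vector are assumptions.

```python
def trim_latch_distribution(capture, trim_max=7):
    """Return (final trim_LDD value, latched image value).

    capture(trim) is a hypothetical stand-in for one measurement at the given
    trim_LDD setting; it returns the binary-encoded image value.
    """
    trim_ldd = 0
    latched = capture(trim_ldd)       # initial image, saved with dna_latch_en
    while True:
        trim_ldd += 1                 # roughly one-eighth of a tap more delay
        value = capture(trim_ldd)
        match_ldd = value == latched
        if not match_ldd:
            # Second case: the image changed, so save it and move on.
            return trim_ldd, value
        if trim_ldd == trim_max:
            # Third case: no change across the full range; reset trim to zero.
            return 0, latched
        # First case: still matching and below maximum, so increment again.
```

For instance, if the encoded image reads 5 for trim settings 0 through 2 and 6 from setting 3 onward, the loop stops at trim_LDD = 3 with a latched value of 6.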

The matching of the data-recovery link clock to the standard link clock path delay is the final sequence of the trimming process of the DSC Clock Channel 600 extended cold training sequence. As shown in FIG. 8, matching the propagation delay delta between the drc_lnk_clk and the std_lnk_clk clock signal paths requires a gross delay adjustment realized in the Delay Line 840 of the DSC Clock Channel 600. Fine delay adjustments to the drc_lnk_clk signal path are realized by a Trim Delay Circuit 850 used to insert delay.

As shown in FIG. 8, the FSM 860 first sets the mode_sel vector to select the std_lnk_clk input of the DSC Clock Channel 600 input multiplexer and the force_tap vector to zero, selecting tap “zero” of the DSC Clock Channel Delay Line 840. The trimL2_sel[2:0] vector to the Trim Delay Circuit 850 is also set to a zero value by the FSM 860 in this segment of the training sequence. The FSM 860 asserts a reflow_conf_en signal to configure the Delay Line 840 for trimming mode operation. While operating in the trim mode configuration, the signal applied to its “event input” is routed directly to the tap0. The signal applied to the “echo input” is routed such that it will travel the full length of the Delay Line 840 excluding tap0. When in this configuration the delay image of the signal applied on the “echo input” of the Delay Line 840 can be captured with the Clock Image Latch 800. Waiting a minimum number of clock cycles, the FSM 860 asserts a trim_st2dr_en signal to capture an initial skew image of the drc_lnk_clk to std_lnk_clk clock signals with a force_tap vector value of zero.

The Image Decode Logic 810 decodes this value to select a tap that aligns an edge of the drc_lnk_clk signal to an edge of the std_lnk_clk signal. The selected tap value is converted to a binary value by the String-to-Binary Encoder circuit 820 and passed to the DNA logic block 830. The FSM 860 cycles a trim_latch2_en signal to capture the encoded binary value of the tap to align signals drc_lnk_clk and std_lnk_clk and complete the gross-tuning segment.

With the completion of the gross-tuning segment, the FSM 860 begins the fine-tuning segment using the value captured during the gross-tuning segment as the initial value. The FSM 860 increments the trimL2_sel vector and cycles the trim_st2dr_en signal to capture an updated fine-tune skew image. The FSM 860 monitors the state of a match_trm2 signal and performs one of three functions. First, if the match_trm2 signal remains asserted and the trimL2_sel vector does not equal its maximum value, the trimL2_sel vector value is incremented, and an updated skew image is captured. Second, if the match_trm2 signal is deactivated, the trim_latch2_en signal is cycled to save the most recent skew image of the drc_lnk_clk to std_lnk_clk clock signals to complete the trimming procedure. Third, if the match_trm2 signal remains asserted and the trimL2_sel vector equals its maximum value, the FSM 860 resets the trimL2_sel vector to zero to complete the trimming procedure. Once the trimming procedure is complete, the drc_lnk_clk to std_lnk_clk clock signals are zero-phase-locked or 180-phase-locked. Upon completion of the fine-tune segment, the DSC Clock Channel FSM 860 maintains the inactive state of signal phs_adj_done (mod_clk_adj_done) shown in FIG. 8 to the DSC Data Channels 610 within the DSC Module 350 and immediately begins the process to measure the link clock period and calculate its quarter lambda offset value. The FSM 860 asserts a meas_strt signal to create a meas_pulse signal at the DSC Clock Channel 600 delay line input, as shown in FIG. 8. On the next clock cycle, the meas_cntl signal is asserted, capturing the clock period measurement in the Clock Image Latch 800 and terminating the meas_pulse signal one clock cycle later, returning to the normal link clock waveform. The clock image word is an N-bit word representing the clock period of the received link clock signal in terms of delay taps or simply “taps”.
A “one-hot” decode ofthe clock image word is performed by the Image Decode Logic 810, whichlocates the tap of the first 1-to-0 transition, representing the periodof the link clock signal. The String-to-Binary Encoder circuit 820converts the “one-hot” tap location to a binary value generating thetap_decode vector shown in FIG. 8. The tap_decode vector output of theString-to-Binary Encoder circuit 820 is a binary value of the link clockperiod that is processed further by the DNA logic block 830 to generatefinal tap_adr signals. The DNA logic block 830 performs a set ofarithmetic operations on the tap_decode vector to arrive at the finaltap_adr value. As shown in FIG. 29A, in a first and second operation thebinary value of the Clock-Image-Latch distribution delay is subtractedfrom the tap_decode vector, and then the adjusted value is right-shiftedtwo positions to derive a divide-by-four value. In a third operation, abalance delay value is added to the results of the divide-by-fouroperation to produce the final tap_adr value. The balance delay value isrequired to align the edges of the drc_lnk_clk and std_lnk_clk clocksignals. Once the clock channel CTS operation has completed, the DSCClock Channel FSM 860 asserts signal phs_adj_done (mod_clk_adj_done)allowing the data channels to complete the CTS operation. The DSC DataChannels 610-0, 610-1, 610-2, 610-3, . . . shown in FIG. 6 operate asfollows during a cold training sequence (CTS). Referring to FIG. 10, theassertion of the reset_lclk signal sets the train_strt2 signal anddeactivates DSC Data Channel signal phs_adj_done shown in FIG. 10 to allof the DSC Data Channels 610-0, 610-1,610-2, 610-3, . . . in the DSCModule 350. Assertion of the train_strt2 signal launches the extendedtraining sequence (ETS) in the data channels, while the deactivation ofthe mod_clk_adj_done signal prevents the DSC Data Channels 610-0, 610-1,610-2, 610-3, . . . from completing the trimming and phase correctionprocesses.
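By way of illustration only, the three arithmetic operations attributed above to the clock-channel DNA logic block 830 (FIG. 29A) may be sketched in software as follows. The function name, the argument names latch_dist_delay and balance_delay, the error check, and the numeric values are hypothetical and are not taken from the described embodiment; all quantities are assumed to be expressed in taps.

```python
def clock_tap_adr(tap_decode: int, latch_dist_delay: int, balance_delay: int) -> int:
    """Sketch of the quarter-period tap address computation (all values in taps)."""
    # Operation 1: subtract the Clock-Image-Latch distribution delay
    # from the measured clock period.
    adjusted = tap_decode - latch_dist_delay
    if adjusted < 0:
        # Hypothetical guard, not described in the text.
        raise ValueError("distribution delay exceeds measured period")
    # Operation 2: right-shift two positions to divide by four.
    quarter = adjusted >> 2
    # Operation 3: add the balance delay to obtain the final tap_adr value.
    return quarter + balance_delay


# Illustrative values only: a 44-tap measured period, 4 taps of latch
# distribution delay, and 3 taps of balance delay.
print(clock_tap_adr(44, 4, 3))  # prints 13
```

The right shift truncates any remainder, so the result in this sketch is accurate to within one tap of a true quarter period.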

Referring to FIG. 10, in the DSC Data Channel 610, the distribution-delay measurement of the Data Image Latch 1010 begins with its finite state machine or FSM 1000 setting the mode_sel[1:0] and trim_LDD[2:0] vectors. The mode_sel[1:0] vector is set to select the meas_en input of the DSC Data Channel 610 input multiplexer, while the trim_LDD[2:0] vector is set to its minimum value. After waiting a minimum number of std_lnk_clk cycles, the meas_en signals are asserted, launching a measurement ping pulse down the DSC Data Channel 610 Delay Line 1070 while activating the Data Image Latch 1010 to capture the initial latch distribution delay image. A dna_latch_en signal is asserted by the FSM 1000 to save the binary-encoded value in the DNA logic latch 1040. The FSM 1000 begins the process to refine the distribution delay image by deactivating the meas_en signals and incrementing the trim_LDD vector value to increase the latch distribution delay by approximately one-eighth of one tap delay. The meas_en signal is asserted once again to capture the incremented distribution delay value. A match_LDD signal is asserted when the binary-encoded value of the Data Image Latch 1010 equals the previous latched value of the latch distribution delay image. The FSM 1000 monitors the state of the match_LDD signal and performs one of three actions. First, if the match_LDD signal remains asserted and the trim_LDD vector does not equal its maximum value, the trim_LDD vector value is incremented, and the updated distribution delay image is captured. Second, if the match_LDD signal is deactivated, the dna_latch_en signal is cycled to save the most recent delay image value of the distribution delay, and the FSM 1000 proceeds to the next phase of the trimming process. Third, if the match_LDD signal remains asserted and the trim_LDD vector equals its maximum value, the FSM 1000 resets the trim_LDD vector to zero and proceeds to the next phase of the trimming process.
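The three-way decision made by the FSM 1000 during refinement of the distribution delay image may be sketched as the following loop, offered as an illustration under stated assumptions rather than as the actual state machine. The capture callback stands in for the meas_en ping and the binary-encoded output of the Data Image Latch 1010; its name, the function name trim_ldd_search, and the default maximum of 7 (the largest value of a 3-bit trim_LDD vector) are assumptions.

```python
from typing import Callable, Tuple


def trim_ldd_search(capture: Callable[[int], int], trim_max: int = 7) -> Tuple[int, int]:
    """Return (final trim_LDD value, latched delay image) for the fine-trim search."""
    trim = 0
    # Initial image at the minimum trim value, saved via dna_latch_en.
    latched = capture(trim)
    while True:
        # Increase the latch distribution delay by one trim step and re-measure.
        trim += 1
        image = capture(trim)
        if image != latched:
            # match_LDD deactivated: save the most recent image and stop.
            return trim, image
        if trim == trim_max:
            # match_LDD still asserted at the maximum: reset trim to zero and stop.
            return 0, latched
        # Otherwise match_LDD remains asserted below the maximum: keep stepping.


# Hypothetical latch model: the encoded image moves from tap 20 to tap 21
# once five trim steps have been added.
print(trim_ldd_search(lambda t: 20 if t < 5 else 21))  # prints (5, 21)
```

The same loop structure applies to the trimL2_sel and trim_c2d_sel searches described elsewhere in this section, with the corresponding match and latch-enable signals substituted.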

The data-recovery link clock to phase-adjusted-data path delay matching is the final sequence of the trimming process of the DSC Data Channel 610 cold training sequence. This segment of the Data Channel cold training sequence is gated by a Clock Channel signal mod_clk_adj_done, starting only after its assertion. Referring to FIG. 10, matching the propagation delay delta between the drc_lnk_clk and phs_adj_data signal paths requires a gross delay adjustment realized in the DSC Data Channel 610 Delay Line 1070. Fine-tuned delay adjustments of the phs_adj_data signal path are realized by the Trim Delay Circuit 1050, which inserts the delay, with measurement performed by the combination of the Data Image Latch 1010, Image Decode Logic 1020, String-to-Binary Encoder circuit 1030, and DNA logic block 1040.

The FSM 1000 first sets the mode_sel vector value to select the drc_lnk_clk input of the DSC Data Channel 610 input multiplexer and the force_tap vector to zero, selecting tap zero of the DSC Data Channel Delay Line 1070. The FSM 1000 also sets vector trim_c2d_sel[2:0] to a zero value and asserts a reflow_conf_en signal to configure the Delay Line 1070 for trimming mode operation. While operating in its trimming mode configuration, the signal applied to the “event input” is routed directly to tap0. The signal applied to the “echo input” is routed such that it will travel the full length of the Delay Line 1070 excluding tap0. When in this configuration, the delay image of the signal applied on the “echo input” of the Delay Line 1070 can be captured with the Data Image Latch 1010. After waiting a minimum number of clock cycles, the FSM 1000 asserts the trim_c2d_en signal to capture an initial skew image of the drc_lnk_clk to phs_adj_data signal with a force_tap vector value of zero.

The Image Decode Logic 1020 decodes this value to select a tap that aligns an edge of the drc_lnk_clk signal to an edge of the phs_adj_data signal. The selected tap value is converted to a binary value by the String-to-Binary Encoder circuit 1030 and passed to the DNA logic block 1040. The FSM 1000 cycles a trim_c2d_en signal to capture the encoded binary value of the tap to align the drc_lnk_clk and phs_adj_data signals and complete the gross-tuning segment.

With completion of the gross-tuning segment, the FSM 1000 begins the fine-tuning segment using the value captured during the gross-tuning segment as the initial value. The FSM 1000 increments the trim_c2d_sel[2:0] vector, then cycles the trim_c2d_en signal to capture an updated fine-tune skew image. The FSM 1000 monitors the state of a match_vec2[2:0] signal and performs one of three actions. First, if the match_vec2[2:0] signal remains asserted and the trim_c2d_sel[2:0] vector does not equal its maximum value, the trim_c2d_sel[2:0] vector value is incremented and an updated skew image is captured. Second, if the match_vec2[2:0] signal is deactivated, the trim_c2d_en signal is cycled to save the most recent skew image of the drc_lnk_clk to phs_adj_data signals to complete the trimming procedure. Third, if the match_vec2[2:0] signal remains asserted and the trim_c2d_sel[2:0] vector equals its maximum value, the FSM 1000 resets the trim_c2d_sel[2:0] vector to zero to complete the trimming procedure.

Referring to FIG. 10, the Data Channel front end FSM 1000 begins the phase-error correction process to edge-align the input data stream with the link clock signal. The FSM 1000 asserts the meas_en signal, capturing a data phase word in the Data Image Latch 1010 on the next rising edge of the link clock signal std_lnk_clk. It is important to note that the capture of the data phase word is referenced to link clock std_lnk_clk, which represents drc_lnk_clk with zero offset. Hence, the data phase word represents the phase relationship of the data stream to the data-recovery link clock drc_lnk_clk with an offset value of zero. The data image word is an N-bit word representing the phase error of the received link data signal in terms of delay taps or simply “taps”. A “one-hot” decode of the data image word is performed by the Image Decode Logic 1020, which locates the tap position of the first 1-to-0 transition. Referring to FIG. 10, the value of the Data Image Latch 1010 bit zero position (referred to as data_image_latch[0]) is used to determine a leading or lagging phase condition of the data image word. A data_image_latch[0] value equal to one is regarded as a phase-leading condition, and a data_image_latch[0] value equal to zero indicates a phase-lagging condition. The String-to-Binary Encoder circuit 1030 converts the “one-hot” tap location to a binary value and generates the tap_decode vector. The tap_decode vector output of the String-to-Binary Encoder circuit 1030 is a binary value of the link data phase error that is processed further by the DNA logic block 1040 to generate the final tap_adr signals, as shown in FIG. 10.
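The decode path just described (image word, “one-hot” tap location, binary tap_decode value, and leading/lagging determination) may be modeled as follows, purely as a sketch. The image word is represented as a Python list with index 0 corresponding to data_image_latch[0]. The text does not state which side of the 1-to-0 transition is reported, so this sketch assumes the tap at which the first 0 after a 1 appears; that choice and all function names are assumptions.

```python
from typing import List


def one_hot_decode(image: List[int]) -> List[int]:
    """Mark the tap of the first 1-to-0 transition in the image word."""
    one_hot = [0] * len(image)
    for tap in range(len(image) - 1):
        if image[tap] == 1 and image[tap + 1] == 0:
            # Assumption: report the tap where the 0 first appears.
            one_hot[tap + 1] = 1
            break
    return one_hot


def string_to_binary(one_hot: List[int]) -> int:
    """Convert a one-hot tap location to its binary (integer) tap index."""
    if 1 not in one_hot:
        raise ValueError("no 1-to-0 transition found in image word")
    return one_hot.index(1)


def phase_condition(image: List[int]) -> str:
    """Bit zero equal to one is phase-leading; equal to zero is phase-lagging."""
    return "leading" if image[0] == 1 else "lagging"


# Hypothetical 8-tap image word.
image = [0, 0, 1, 1, 0, 0, 0, 1]
tap_decode = string_to_binary(one_hot_decode(image))
print(tap_decode, phase_condition(image))  # prints 4 lagging
```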

For purposes of testing, the final tap_adr value can be further modified by adding or subtracting a registered offset value, a csr_data_ch_offset[5:0] vector. The DNA logic block 1040 performs a set of arithmetic operations on the tap_decode vector to arrive at the final tap_adr value. As shown in FIG. 29B, in a first and second operation the binary value of the Data-Image-Latch distribution delay is subtracted from the tap_decode vector, and the balance delay value that is required to align the edges of the drc_lnk_clk clock and phs_adj_data signals is added to arrive at the final tap_adr value. A phs_adj_done signal is asserted a sufficient number of clock cycles after the meas_en signal is asserted to assure that phase-corrected data has propagated through the Data Recovery String 1060 shown in FIG. 10. Assertion of the phs_adj_done signal indicates the end of the phase-correction sequence and enables data output signals neg_data_bit and pos_data_bit to the Data FIFO block 900 shown in FIG. 9.

Referring to FIG. 11, in the Data FIFO stack 1170, assertion of the train_strt2 signal resets the SSM Start Sequencer 1130 and activates the std_lnk_clk. Reset of the SSM Start Sequencer 1130 also deactivates the train_mode signal, forcing the Write Pointer 1140 to a zero value. Hence, during a training sequence, only Data FIFO stack 1170 location zero is selected.

As shown in FIG. 11, the neg_data_bit and pos_data_bit streams are written into the Data FIFO stack 1170 on the rising edge of the std_lnk_clk signal. The Data FIFO stack 1170 is preferably implemented as an addressable “write-per-bit” FIFO to minimize power dissipation. The FIFO write address, fifo_bit_adr, from the FIFO Address Encoder 1150 is derived from the Write Pointer 1140 and Frame Bit Counter 1160 logic blocks along with the state of the pos_pattrn_flg signal. The pos_pattrn_flg signal from the Pattern Search FSM 1120 is a logical one if the positive bits of the Data FIFO stack 1170 all have a value of logical one. Otherwise, the pos_pattrn_flg signal is a logical zero. The Data FIFO read and write order is altered depending on the state of the pos_pattrn_flg signal to maintain proper byte framing due to negative phase alignment of the data stream from the DSC Data Channel front-end block 910 shown in FIG. 9. The fifo_bit_adr signal is updated with the state of the pos_pattrn_flg signal with the deactivation of the train_mode signal during the SSM byte.

With the assertion of the train_mode and phs_adj_done signals, the Pattern Search FSM 1120 begins SSM and training fence pattern searches. In normal operation, signal phs_adj_done is deactivated when the clock channel enters its training sequence and reactivated once the clock channel has completed its training. While the phs_adj_done signal is deactivated, the Pattern Search FSM 1120 suspends its search for the SSM byte and training fence pattern, and it resumes its search once phs_adj_done has been asserted. If after two transitions of the frame_inc signal the training fence pattern has not been detected, the ssm_err_flg signal is asserted. Waiting two frame_inc signal transitions assures that the DSC Data Channel front-end block 910 has had sufficient time to phase-adjust the data stream to be properly sampled. The SSM search continues, and the ssm_found_flg signal is asserted once the SSM pattern has been detected. If a non-training fence pattern or non-SSM pattern is detected during this time, the ssm_err_flg signal is asserted, ending the SSM search. The states of the ssm_found_flg and ssm_err_flg signals are maintained upon the deactivation of the train_mode signal.

Additionally, the states of the ssm_found_flg and ssm_err_flg signals are transmitted to the DSC Clock Channel 600 shown in FIG. 6 and used internally by the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . , as well. The DSC Clock Channel 600 processes these signals and broadcasts the results to all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . within the DSC Module 350, as well as to other DSC Modules within the bundle shown in FIG. 3. Internal to the DSC Data Channel 610, these signals are used by the SSM Start Sequencer 1130 logic shown in FIG. 11, which sets the proper byte framing.

The SSM Start Sequencer 1130 aligns and initializes the Write Pointer 1140 and Frame Bit Counter 1160 to properly frame the data byte. This procedure assures that the first data bit received after the training sequence is written to Data FIFO stack 1170 location zero. Upon the assertion of the ssm_found_flg signal, the SSM Start Sequencer 1130 asserts the frame_init signal, while deactivating the frame_strt signal. The frame_init signal forces the Frame Bit Counter 1160 to a zero value to generate a fifo_bit_adr vector of zero, to which the first data bit will be stored. The Write Pointer 1140 and Frame Bit Counter 1160 are held at this initialization state until the SSM Start Sequencer 1130 asserts the frame_strt signal two UI from the end of the SSM byte to start the Frame Bit Counter 1160. The SSM Start Sequencer 1130 deactivates the train_mode signal at the end of the SSM byte. In the case of an SSM error, the ssm_err_flg signal is asserted and used as part of a process to observe error-framed data where the DSC Data Channel 610 outputs data although it is incorrectly framed, allowing the system to diagnose the error condition.

Logic to generate the fifo_bit_adr vector comprises the Write Pointer 1140, Frame Bit Counter 1160, and FIFO Address Encoder 1150 logic blocks. The Data FIFO stack 1170, as previously mentioned, is preferably implemented as a “write-per-bit” FIFO, where each bit of data is written to a particular bit location pointed to by the fifo_bit_adr vector. The frame_adr signal and a wr_pntr_adr vector are used by the FIFO Address Encoder 1150 to generate the fifo_bit_adr vector. The Frame Bit Counter 1160 is sized for two-byte count intervals, which aligns double frames on an integer number of link clock cycles. Although not required for fifo_bit_adr vector generation, the write pointer is sized to match the read pointer, which reduces the logic needed for Data FIFO status calculations. The least significant portion (or LSB) of the wr_pntr_adr vector is not used by the FIFO Address Encoder 1150 to generate the fifo_bit_adr vector, thus requiring the write pointer to be incremented twice. This requires the Frame Bit Counter 1160 to assert a frame_inc signal to increment the Write Pointer 1140 value on each byte time interval. It is noted that intermediate increment values of the write pointer do not affect the fifo_bit_adr vector, but they allow read-to-write pointer tracking. Configuration of the Write Pointer 1140 and Frame Bit Counter 1160 in this way accommodates odd-numbered encoding schemes (for example, 8b9b) with reduced design complexity.

The DSC Clock Channel 600 receives the ssm_found_flg and ssm_err_flg signals from all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . within the DSC Module 350. As shown in FIG. 26, a bim_all_ssm_found_flg signal is the logical “AND” of all ssm_found_flg signals from the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . and is synchronized to the core clock domain before being broadcast to all DSC Modules 350-0, 350-1, . . . , 350-N within the bundle via the BIM 360. A bim_ssm_err_flg signal shown in FIG. 26 is the logical “OR” of all ssm_err_flg signals from the DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . and is synchronized to the core clock domain before being broadcast to all DSC Modules 350-0, 350-1, . . . , 350-N within the bundle via the BIM 360. Each DSC Module 350-0, 350-1, . . . , 350-N in the bundle receives the broadcast signals bim_ssm_err_flg and bim_all_ssm_found by the utility block of the DSC Clock Channel 600 and generates a mod_rd_cnt_enb signal that is broadcast to all DSC Data Channels 610-0, 610-1, 610-2, 610-3, . . . in the DSC Module 350, as shown in FIG. 27. The mod_rd_cnt_enb signal shown in FIG. 27 is the logical “OR” of bim_ssm_err_flg and bim_all_ssm_found signals shown in FIG. 14. The assertion of the mod_rd_cnt_enb signal marks the end of the training sequence and the start of normal operation.
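The flag reduction described above may be expressed compactly as follows. This is a behavioral sketch for a single DSC Module only; it does not model synchronization to the core clock domain or the combination of flags across modules by the BIM 360, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def module_flags(ssm_found_flg: Sequence[bool], ssm_err_flg: Sequence[bool]) -> Tuple[bool, bool]:
    """Reduce per-data-channel flags to (bim_all_ssm_found_flg, bim_ssm_err_flg)."""
    all_found = all(ssm_found_flg)  # logical AND of all ssm_found_flg signals
    any_err = any(ssm_err_flg)      # logical OR of all ssm_err_flg signals
    return all_found, any_err


def mod_rd_cnt_enb(bim_all_ssm_found: bool, bim_ssm_err_flg: bool) -> bool:
    """Training ends when every channel found the SSM byte or any channel erred."""
    return bim_all_ssm_found or bim_ssm_err_flg


# Four hypothetical data channels: three have found the SSM byte, none has erred.
found, err = module_flags([True, True, False, True], [False, False, False, False])
print(found, err, mod_rd_cnt_enb(found, err))  # prints False False False
```

Because an error on any channel also asserts mod_rd_cnt_enb in this sketch, the read counters start even for incorrectly framed data, which is consistent with the error-framed data observation described above for the ssm_err_flg signal.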

The Bundle Interface Module (BIM) 360 shown in FIG. 3 operates as follows during a cold training sequence (CTS). The BIM 360 provides an interconnect path for the DSC Module high-speed (link-speed) and low-speed (core-speed) signals. Signals classified as core speed are simply buffered and are presumed to settle within the period of the core clock cycle. Link-speed signals are also buffered but require settling times of multiple link-clock cycles. Hence, for link-speed signals, a BIM 360 calibration sequence is required to properly sample and synchronize link-speed signals. This operation is performed only during CTS operations immediately after a link reset and is completed during the DSC Clock Channel CTS trimming segment. Calibration of the BIM 360 is completely transparent to all functions of the DSC Module 350.

The extended training sequence (ETS) generally operates as follows. The ETS is a super-set of the warm training sequence (WTS) that includes the trimming procedure of the cold training sequence. The purpose of the ETS operation is to match/calibrate delay paths of the clock and DSC Data Channels 610 without initiating a link reset, thus preventing data loss.

ETS operations are initiated at the source node of the transmitter 300 with the SSD Module 310 transmitting an ETS command on all link data channels 320. The ETS command has a minimum number of consecutive command bytes (typically four bytes), each being a unique pattern not found in the normal data stream. Other methods, such as “byte count”, may also be used to initiate an ETS command. A minimum error-free length or byte count for the ETS command is required to assure that random bit errors cannot initiate a false training sequence. Once started, ETS operations follow the CTS operations.
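The requirement of a minimum number of consecutive, error-free command bytes may be illustrated with the following sketch for a single data channel. The command byte value 0xF0 used in the example, the function name, and the byte-oriented model of the received stream are assumptions made for illustration; the text specifies only that the command pattern is unique and that four bytes is typical.

```python
from typing import Optional


def find_ets_command(stream: bytes, command_byte: int, min_run: int = 4) -> Optional[int]:
    """Return the index where the first qualifying command run starts, or None."""
    run = 0
    for index, value in enumerate(stream):
        # Any non-command byte (for example, one corrupted by a bit error)
        # restarts the count, so a short or broken run cannot start training.
        run = run + 1 if value == command_byte else 0
        if run == min_run:
            return index - min_run + 1
    return None


# Hypothetical stream: a two-byte false start, then a full four-byte command.
rx = bytes([0x01, 0xF0, 0xF0, 0x02, 0xF0, 0xF0, 0xF0, 0xF0, 0x03])
print(find_ets_command(rx, 0xF0))  # prints 4
```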

In summary, the method of the present invention comprises a unique set of steps or procedures for deskewing a circuit. A series of three procedures that directly and explicitly deskew a circuit is provided. The steps are referred to as: 1) a cold training sequence (CTS); 2) a warm training sequence (WTS); and 3) an extended training sequence (ETS). These steps have proven successful to deskew a circuit.

Preferably, the method of the present invention includes the cold training sequence (CTS). A timing diagram for the CTS is shown in FIGS. 24A and 24B.

The method of the present invention also includes the warm training sequence (WTS). A timing diagram for the WTS is shown in FIG. 23.

By the end of training provided by the method of the present invention, a circuit is deskewed. This enables the circuit to effectively read data.

The above describes the preferred dynamic skew compensation architecture and method of operation. Two alternate architectures and methods are described below.

A first alternate architecture with a base assumption of a non-modified link clock requires that a sampling offset value be added to the data channel phase-adjust value to obtain the proper sampling point. To achieve this, the phase measurement path and normal data path must have identical delays. Hence, those path elements that are present in the data path but not found in the phase measurement path must be replicated to perform a delay cancellation function.

There are two main advantages to this first alternate architecture. The first is a straightforward conceptual understanding. The second advantage is that the link clock is passed throughout the DSC Module 350 and bundle uninterrupted.

On the other hand, a first disadvantage of this architecture is that replication errors of both value and “tracking” over the desired operating range are introduced into all measurements, affecting the final placement of the sampling edge. A second disadvantage is that if the path element to be replicated is large, this will significantly increase the area required to implement the DSC Module 350. A third disadvantage is that this architecture has a greater sensitivity to the manufacturing variation known as Across Chip Length Variation (ACLV), a phenomenon that introduces performance differences between identical circuits placed at different locations. A fourth disadvantage is that replicated circuits included in either the phase measurement or data paths must meet the design criteria of these paths. This forces those circuits being replicated to meet a design criterion that is much greater than what would be normally required for that particular circuit or function. As a result, an increase in circuit area, power dissipation, and design effort is needed for this architecture.

A second alternate architecture is based on a sweep of the sample point while monitoring the state of the sampling flip-flops' outputs and also assumes a non-modified link clock. This architecture is designed to address the timing error introduced by implementing replicated path elements and timing effects introduced by ACLV.

There are two main advantages to this second alternate architecture. The first is the elimination of the image latches and replicated logic, thereby reducing the area needed. The second advantage is that it directly addresses, and has a method to compensate for, ACLV timing effects.

On the other hand, a first disadvantage is that the determination of the sampling point is based on detecting the metastable point of the sampling flip-flop. Further, it depends on the accurate modeling of the sampling flip-flops' metastable behavior. Near a metastable point, behavior must be considered which may significantly affect the sample point selected. A second disadvantage is that the warm training sequence is significantly increased due in part to the metastable recovery time and the sample point search algorithm. A third potential disadvantage is the significant increase in the logic and area required to implement the search algorithm. However, this increase may be offset by the elimination of the measurement latch.

Although the present invention has been described with a particular degree of specificity with reference to various embodiments, it should be understood that numerous changes both in the form and steps disclosed can be made without departing from the spirit of the invention. The scope of protection sought is to be limited only by the scope of the appended claims that are intended to suitably cover the invention.

1. A method to dynamically compensate for skew across a plurality of parallel data channels sharing a common clock channel by performing a cold training sequence, comprising the steps of: a. sending a training sequence from a source to a receiver across the parallel data channels, the training sequence having a link-reset signature, fence pattern, and source synchronous marker (SSM) byte; b. detecting the link-reset signature for a minimum specified amount of time to assure detection in all data channels; c. performing delay path calibration for the clock channel; d. performing link skew compensation; and e. in accordance with the fence pattern, providing word alignment using the SSM byte of the training sequence.
2. The method of claim 1 wherein the plurality of parallel data channels is scalable.
3. The method of claim 1 wherein the link-reset signature is produced by stopping the link clock switching by holding the link clock signal at either a logical one or logical zero state.
4. The method of claim 1 wherein performing link skew compensation consists of a warm training sequence comprising the steps of: a. sending a second training sequence having a command sub-sequence, second fence pattern, and second SSM byte; and b. in accordance with the second fence pattern, providing word alignment using the second SSM byte.
5. The method of claim 1, further comprising the steps of: detecting a system power-up or reset or unrecoverable link error; and initiating the cold training sequence without the capability to correctly sample or frame an input data stream after system power-up or reset or unrecoverable link error.
6. The method of claim 1, further comprising the steps of: a. causing a link reset such that the data channels are reset and all calibration values obtained during a previous training sequence are cleared; and b. immediately initiating a second training sequence.
7. A method to dynamically compensate for skew across a plurality of parallel data channels sharing a common clock channel by performing a warm training sequence, comprising the steps of: a. sending a training sequence having a command sequence, fence pattern, and source synchronous marker (SSM) byte; b. in accordance with the fence pattern, providing word alignment using the SSM byte of the training sequence; and c. in accordance with the command sequence, sending multiple command bytes; thereby reducing the probability of false or split training sequences due to multiple bit errors.
8. A method to perform dynamic skew compensation across a plurality of parallel data channels sharing a common clock channel without interruption of the clock channel signal and preventing loss of data, comprising the steps of: a. measuring the clock period of the clock channel signal; b. determining a clock offset value to properly sample received data from each data channel; c. performing phase correction of all data channels to the common clock channel by aligning any data edge to any clock edge; d. identifying data bit zero phase alignment; and e. adjusting a write location to maintain proper bit ordering.
9. A system to deskew a parallel data link having a plurality of channels for exchanging digital data, the link comprising: source and destination nodes with an interconnect medium there between; a Source Synchronous Driver (SSD) at the link source node to format “M” bits of input data received from core logic and to drive “M” data channels onto the link along with a link clock; a Dynamic Skew Compensation (DSC) architectural block at the link destination node to receive the “M” data bits and link clock and to compensate for skew, recenter the link clock edge relative to the bits of data, and output “M” bits of data, the DSC block comprising: a DSC bundle consisting of a plurality of DSC Modules interconnected with a Bundle Interface Module, wherein the DSC Modules perform adjustments to compensate for channel-to-channel skew and substantially center the link clock edge with respect to the data bits.
10. The system of claim 9 wherein the digital data bits are exchanged as pulses of electrical energy over electrically conductive material.
11. The system of claim 9 wherein the digital data bits are exchanged as pulses of light over optic fiber.
12. The system of claim 9 wherein each DSC Module comprises a plurality of DSC Data Channels and a DSC Clock Channel.
13-14. (canceled)
15. The system of claim 12 wherein each DSC Data Channel comprises a Data Channel Front-End block, Data FIFO, and utility block.
16. The system of claim 15 wherein the Data Channel Front-End block comprises a finite state machine, Data Image Latch, Image Decode Logic, String-to-Binary Encoder circuit, DNA logic block, Delay Line, Trim Delay Circuit, and glue logic.
17. The system of claim 15 wherein the Data FIFO comprises a Pattern Search finite state machine, Data FIFO Register File, FIFO Address Encoder, Write Pointer, Read Pointer, Frame Bit Counter, and Skew Synchronizing Marker Start Sequencer.
18. The system of claim 17 wherein the Data FIFO Register File is a write-per-bit FIFO.
19. The system of claim 12 wherein the DSC bundle performs a cold training sequence.
20. The system of claim 12 wherein the DSC bundle performs a warm training sequence.